Skip numeric drop-out when PComputeWindow is a null_tile_window in Bl…#7256

Merged
qianfengz merged 2 commits into develop from users/qianfengz/ck/block_dropout_no_drop on May 13, 2026
Conversation

@qianfengz
Contributor

The BlockDropout implementation already provides complete logic for generating random numbers and applying dropout to the P tensor after the first attention GEMM, with support for both 32x32 and 16x16 Warp-GEMM shapes and for both wave32 and wave64 architectures.

In some situations, however, we only need the block-level process to generate the random numbers, without also applying dropout to the vgpr tile in real time. For example, xformers' test_mem_eff_attention.py::test_dropout_ck requires the host reference implementation of attention forward with dropout to use the same random numbers as the device-side implementation in order to compare and verify it, so a standalone kernel that only generates random numbers is required.

This PR enables xformers' random-value generating kernel (in file ck_tiled_rand_uniform_kernel.h) to rely entirely on BlockDropout's Run() operator to generate random numbers for a [MPerBlock, NPerBlock] tile during tile iteration, with no need to replicate BlockDropout's logic in the xformers kernel.
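The dispatch idea can be illustrated with a minimal, self-contained C++ sketch. The types and names below (null_tile_window tag, cheap_rand, run_block_dropout) are hypothetical stand-ins for illustration only, not the actual ck_tile API or PRNG: when the P window is a null_tile_window, the same random numbers are still produced, but the numeric drop-out on the tile is skipped at compile time via if constexpr.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical tag type standing in for ck_tile's null_tile_window:
// it signals "there is no P tile to modify".
struct null_tile_window
{
};

// Toy counter-based hash RNG, a stand-in for the real tile PRNG.
// Deterministic in (seed, offset), so host and device sides can agree.
inline uint32_t cheap_rand(uint64_t seed, uint64_t offset)
{
    uint64_t x = seed ^ (offset * 0x9E3779B97F4A7C15ull);
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDull;
    x ^= x >> 33;
    return static_cast<uint32_t>(x);
}

// Sketch of the Run()-style dispatch: random numbers are always generated
// (and returned, so a standalone kernel could write them out); the
// drop-out itself is applied only when the window is a real tile.
template <typename PComputeWindow>
std::vector<uint32_t> run_block_dropout(PComputeWindow& p_window,
                                        uint64_t seed,
                                        uint64_t offset,
                                        int tile_elems,
                                        float rp_keep)
{
    std::vector<uint32_t> rands(tile_elems);
    for(int i = 0; i < tile_elems; ++i)
    {
        rands[i] = cheap_rand(seed, offset + i);
        if constexpr(!std::is_same_v<PComputeWindow, null_tile_window>)
        {
            // Real-tile path: drop or rescale the P element.
            p_window[i] = (rands[i] & 1u) ? p_window[i] * rp_keep : 0.0f;
        }
        // null_tile_window path: random numbers only, no drop-out applied.
    }
    return rands;
}
```

Because the random-number stream depends only on (seed, offset), calling this with a null_tile_window yields exactly the sequence the real drop-out path would consume, which is what a verification kernel needs.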

Comment thread projects/composablekernel/include/ck_tile/ops/fmha/block/block_dropout.hpp Outdated
@qianfengz qianfengz enabled auto-merge (squash) May 12, 2026 09:23
@qianfengz qianfengz merged commit 1fc20eb into develop May 13, 2026
43 of 48 checks passed
@qianfengz qianfengz deleted the users/qianfengz/ck/block_dropout_no_drop branch May 13, 2026 09:41
assistant-librarian Bot pushed a commit to ROCm/composable_kernel that referenced this pull request May 13, 2026
Skip numeric drop-out when PComputeWindow is a null_tile_window in Bl… (#7256)

